The Expectation Maximization Algorithm
Author
Abstract
This note represents my attempt at explaining the EM algorithm (Hartley, 1958; Dempster et al., 1977; McLachlan and Krishnan, 1997). It is just a slight variation on Tom Minka's tutorial (Minka, 1998), perhaps a little easier (or perhaps not). It includes a graphical example to provide some intuition.

1 Intuitive Explanation of EM

EM is an iterative optimization method to estimate some unknown parameters Θ, given measurement data U. However, we are not given some "hidden" nuisance variables J, which need to be integrated out. In particular, we want to maximize the posterior probability of the parameters Θ given the data U, marginalizing over J:

\Theta^{*} = \arg\max_{\Theta} \sum_{J \in \mathcal{J}^{n}} P(\Theta, J \mid U) \qquad (1)

The intuition behind EM is an old one: alternate between estimating the unknowns Θ and the hidden variables J. This idea has been around for a long time. However, instead of finding the best J ∈ 𝒥 given an estimate Θ at each iteration, EM computes a distribution over the space 𝒥.

One of the earliest papers on EM is (Hartley, 1958), but the seminal reference that formalized EM and provided a proof of convergence is the "DLR" paper by Dempster, Laird, and Rubin (Dempster et al., 1977). A recent book devoted entirely to EM and its applications is (McLachlan and Krishnan, 1997), whereas (Tanner, 1996) is another popular and very useful reference. One of the most insightful explanations of EM, which provides a deeper understanding of its operation than the intuition of alternating between variables, is in terms of lower-bound maximization (Neal and Hinton, 1998; Minka, 1998). In this derivation, the E-step can be interpreted as constructing a local lower bound to the posterior distribution, whereas the M-step optimizes the bound, thereby improving the estimate for the unknowns. This is demonstrated below for a simple example.

[Figure 1: EM example: mixture components and data. The data consist of three samples drawn from each mixture component, shown as circles and triangles. The means of the mixture components are −2 and 2, respectively.]
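To make the alternation concrete, here is a minimal sketch of EM for a one-dimensional mixture of two unit-variance, equal-weight Gaussians, in the spirit of the Figure 1 example. The function name `em_gmm_1d`, the sample data, and the initialization are illustrative assumptions, not taken from the note.

```python
import numpy as np

def em_gmm_1d(u, mu, n_iters=50):
    """Minimal EM sketch for a 1-D mixture of two unit-variance,
    equal-weight Gaussians (hypothetical helper, for illustration).

    u  : data samples (the measurements U)
    mu : initial guesses for the two component means (the parameters Theta)
    """
    mu = np.array(mu, dtype=float)
    for _ in range(n_iters):
        # E-step: instead of hard-assigning each sample to its best
        # component, compute a distribution over the hidden labels J.
        log_lik = -0.5 * (u[:, None] - mu[None, :]) ** 2   # up to a constant
        resp = np.exp(log_lik)
        resp /= resp.sum(axis=1, keepdims=True)            # responsibilities
        # M-step: maximize the resulting lower bound; for known variances
        # this reduces to responsibility-weighted means.
        mu = (resp * u[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu

# Three illustrative samples near each true mean (-2 and 2), as in Figure 1.
u = np.array([-2.5, -2.0, -1.6, 1.7, 2.1, 2.4])
print(em_gmm_1d(u, mu=[-1.0, 1.0]))
```

Starting from the symmetric initialization [-1, 1], the estimated means should move close to the true values −2 and 2 within a few iterations.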
Similar References
Quantitative SPECT and planar 32P bremsstrahlung imaging for dosimetry purposes – an experimental phantom study
Background: In this study, quantitative 32P bremsstrahlung planar and SPECT imaging and the consequent dose assessment were carried out as a comprehensive phantom study to define an appropriate method for accurate dosimetry in clinical practice. Materials and Methods: CT, planar, and SPECT bremsstrahlung images of a Jaszczak phantom containing a known activity of 32P were acquired. In addition, Phanto...
The Development of Maximum Likelihood Estimation Approaches for Adaptive Estimation of Free Speed and Critical Density in Vehicle Freeways
The performance of many traffic control strategies depends on how accurately the traffic flow models have been calibrated. One of the most widely applicable traffic flow models in traffic control and management is the LWR or METANET model. In practice, key parameters of the LWR model, including free-flow speed and critical density, are estimated using flow and speed measurements gathered by inductive ...
An Explanation of the Expectation Maximization Algorithm, Report no. LiTH-ISY-R-2915
The expectation maximization (EM) algorithm computes maximum likelihood estimates of unknown parameters in probabilistic models involving latent variables. More pragmatically speaking, the EM algorithm is an iterative method that alternates between computing a conditional expectation and solving a maximization problem, hence the name expectation maximization. In this work we derive the EM ...
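In the notation of Equation (1), the alternation described in this abstract is usually written as follows (a standard textbook formulation, not quoted from the report). Given a current estimate Θ^t, the E-step computes the expected complete-data log-likelihood and the M-step maximizes it:

Q(\Theta \mid \Theta^{t}) = \mathbb{E}_{J \mid U, \Theta^{t}}\left[\log P(U, J \mid \Theta)\right] \qquad \text{(E-step)}

\Theta^{t+1} = \arg\max_{\Theta} Q(\Theta \mid \Theta^{t}) \qquad \text{(M-step)}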
An Improved EM algorithm
In this paper, we first give a brief introduction to the expectation maximization (EM) algorithm, and then discuss the initial-value sensitivity of the EM algorithm. Subsequently, we give a short proof of EM's convergence. We then carry out experiments with the EM algorithm (all experiments are implemented on a Gaussian mixture model (GMM)). Our experiment...
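As a quick illustration of the initialization sensitivity discussed in this abstract, the hypothetical `em_gmm_1d` sketch shown earlier can be started from different points. One guaranteed failure mode: if the two means are initialized identically, every responsibility is 0.5 and EM is stuck at a degenerate fixed point.

```python
# Assumes em_gmm_1d and the sample data u from the earlier sketch.
print(em_gmm_1d(u, mu=[-1.0, 1.0]))  # separates toward [-2, 2]
print(em_gmm_1d(u, mu=[0.0, 0.0]))   # identical initial means: responsibilities
                                     # stay at 0.5, both means collapse to the
                                     # overall data mean and never separate
```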
Bayesian K-Means as a "Maximization-Expectation" Algorithm
We introduce a new class of "maximization-expectation" (ME) algorithms where we maximize over hidden variables but marginalize over random parameters. This reverses the roles of expectation and maximization in the classical EM algorithm. In the context of clustering, we argue that these hard assignments open the door to very fast implementations based on data structures such as kd-trees and cong...
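The hard assignments this abstract refers to can be pictured with a k-means-style loop. The sketch below (hypothetical names, 1-D data) illustrates only the "maximize over hidden variables" half of the ME idea; the Bayesian marginalization over parameters is omitted.

```python
import numpy as np

def hard_assign_means(u, mu, n_iters=50):
    """K-means-style updates: maximize over hidden labels, then refit means.

    A sketch of the hard-assignment aspect only, not the full Bayesian
    "maximization-expectation" algorithm from the paper.
    """
    mu = np.array(mu, dtype=float)
    for _ in range(n_iters):
        # Hard assignment: each sample goes to its single closest mean,
        # instead of receiving a soft distribution over components.
        labels = np.argmin((u[:, None] - mu[None, :]) ** 2, axis=1)
        # Refit each mean from the samples assigned to it (if any).
        for k in range(len(mu)):
            if np.any(labels == k):
                mu[k] = u[labels == k].mean()
    return mu
```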